Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however, it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing, Washington, DC, USA, June 2017 (ScienceCloud 2017), 7 pages.
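The abstract describes using Python language constructs to execute arbitrary functions on private data inside the enclave. A minimal sketch of that pattern is below; the names (`secure_task`, `run_in_enclave`) are illustrative inventions, not Cloud Kotta's real API, and a real system would also serialize the function body itself (e.g. with cloudpickle) rather than passing the callable directly as done here to keep the sketch stdlib-only.

```python
import pickle
from typing import Any, Callable

# Hypothetical sketch: a decorator ships a function's inputs to a secure
# enclave and runs the analysis near the protected data. All names here
# are assumptions for illustration, not the actual Cloud Kotta interface.

def secure_task(func: Callable) -> Callable:
    """Wrap a function so its invocation is executed 'inside' the enclave."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        payload = pickle.dumps((args, kwargs))  # serialize the inputs
        return run_in_enclave(func, payload)    # execute near the data
    return wrapper

def run_in_enclave(func: Callable, payload: bytes) -> Any:
    # Stand-in for the enclave side: deserialize inputs and execute.
    args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)

@secure_task
def word_count(records: list) -> int:
    # Example analysis over (notionally private) text records.
    return sum(len(r.split()) for r in records)
```

In this shape, the researcher writes ordinary Python in a Jupyter notebook; the decorator is the only boundary between the interactive session and the secure execution context.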
Cloud Kotta: Enabling Secure and Scalable Data Analytics in the Cloud
Distributed communities of researchers rely increasingly on valuable, proprietary, or sensitive datasets. Given the growth of such data, especially in fields new to data-driven research like the social sciences and humanities, coupled with what are often strict and complex data-use agreements, many research communities now require methods that allow secure, scalable, and cost-effective storage and analysis. Here we present CLOUD KOTTA: a cloud-based data management and analytics framework. CLOUD KOTTA delivers an end-to-end solution for coordinating secure access to large datasets, and an execution model that provides both automated infrastructure scaling and support for executing analytics near the data. CLOUD KOTTA implements a fine-grained security model ensuring that only authorized users may access, analyze, and download protected data. It also implements automated methods for acquiring and configuring low-cost storage and compute resources as they are needed. We present the architecture and implementation of CLOUD KOTTA and demonstrate the advantages it provides in terms of increased performance and flexibility. We show that CLOUD KOTTA's elastic provisioning model can reduce costs by up to 16x when compared with statically provisioned models.
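The fine-grained security model described above, in which only authorized users may access, analyze, or download a protected dataset, can be sketched as a per-dataset grant check. This is a hedged illustration of the concept only; the names (`Dataset`, `can_access`) are hypothetical and not CLOUD KOTTA's actual implementation.

```python
from dataclasses import dataclass, field

# Illustrative sketch of per-dataset, per-user authorization: every
# request to touch protected data is checked against an explicit grant
# list. Names are assumptions, not the real CLOUD KOTTA API.

@dataclass
class Dataset:
    name: str
    authorized_users: set = field(default_factory=set)

ALLOWED_ACTIONS = {"access", "analyze", "download"}

def can_access(user: str, dataset: Dataset, action: str) -> bool:
    # Deny by default: the action must be recognized AND the user must
    # hold an explicit grant on this specific dataset.
    return action in ALLOWED_ACTIONS and user in dataset.authorized_users
```

The deny-by-default structure is the key property: a user with no grant on a dataset cannot reach it through any action.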
The Changing Role of RSEs over the Lifetime of Parsl
This position paper describes the Parsl open source research software project
and its various phases over seven years. It defines four types of research
software engineers (RSEs) who have been important to the project in those
phases; we believe this is also applicable to other research software projects.
Comment: 3 pages.
Developing Distributed High-performance Computing Capabilities of an Open Science Platform for Robust Epidemic Analysis
COVID-19 had an unprecedented impact on scientific collaboration. The
pandemic and its broad response from the scientific community has forged new
relationships among domain experts, mathematical modelers, and scientific
computing specialists. Computationally, however, it also revealed critical gaps
in the ability of researchers to exploit advanced computing systems. These
challenging areas include gaining access to scalable computing systems, porting
models and workflows to new systems, sharing data of varying sizes, and
producing results that can be reproduced and validated by others. Informed by
our team's work in supporting public health decision makers during the COVID-19
pandemic and by the identified capability gaps in applying high-performance
computing (HPC) to the modeling of complex social systems, we present the
goals, requirements, and initial implementation of OSPREY, an open science
platform for robust epidemic analysis. The prototype implementation
demonstrates an integrated, algorithm-driven HPC workflow architecture,
coordinating tasks across federated HPC resources, with robust, secure and
automated access to each of the resources. We demonstrate scalable and
fault-tolerant task execution, an asynchronous API to support fast
time-to-solution algorithms, an inclusive, multi-language approach, and
efficient wide-area data management. The example OSPREY code is made available
on a public repository.
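The asynchronous API described above, which lets an algorithm consume results as they arrive rather than waiting at a barrier, can be sketched with Python's standard `concurrent.futures`. Here a thread pool merely stands in for OSPREY's federated HPC resources, and `run_model` is a hypothetical placeholder for an epidemic-model run; none of these names come from the OSPREY codebase.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Sketch of an algorithm-driven, asynchronous task pattern: submit a
# batch of model runs, then consume each result as soon as it finishes
# so the driving algorithm can react without waiting for the slowest task.
# ThreadPoolExecutor stands in for remote, federated HPC resources.

def run_model(param: int) -> int:
    # Placeholder for a (potentially long-running) simulation task.
    return param * param

def asynchronous_sweep(params: list) -> dict:
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(run_model, p): p for p in params}
        for fut in as_completed(futures):         # completion order, not submit order
            results[futures[fut]] = fut.result()  # algorithm could adapt here
    return results
```

The `as_completed` loop is where a fast time-to-solution algorithm would inspect early results and decide which tasks to launch next.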
A Composition-Transferable Machine Learning Potential for LiCl-KCl Molten Salts Validated by HEXRD
Unraveling the liquid structure of multi-component molten salts is challenging due to the difficulty of conducting and interpreting high-temperature diffraction experiments. Motivated by this challenge, we developed composition-transferable Gaussian Approximation Potentials (GAP) for molten LiCl-KCl. A DFT-SCAN-accurate GAP is actively learned from only ~1100 training configurations drawn from 10 unique mixture compositions enriched with metadynamics. The GAP-computed structures show strong agreement with HEXRD experiments, including for a eutectic composition not explicitly included in model training, thereby opening the possibility for composition discovery.
DLHub: Model and Data Serving for Science
While the Machine Learning (ML) landscape is evolving rapidly, there has been a relative lag in the development of the "learning systems" needed to enable broad adoption. Furthermore, few such systems are designed to support the specialized requirements of scientific ML. Here we present the Data and Learning Hub for science (DLHub), a multi-tenant system that provides both model repository and serving capabilities with a focus on science applications. DLHub addresses two significant shortcomings in current systems. First, its self-service model repository allows users to share, publish, verify, reproduce, and reuse models, and addresses concerns related to model reproducibility by packaging and distributing models and all constituent components. Second, it implements scalable and low-latency serving capabilities that can leverage parallel and distributed computing resources to democratize access to published models through a simple web interface. Unlike other model serving frameworks, DLHub can store and serve any Python 3-compatible model or processing function, plus multiple-function pipelines. We show that relative to other model serving systems including TensorFlow Serving, SageMaker, and Clipper, DLHub provides greater capabilities, comparable performance without memoization and batching, and significantly better performance when the latter two techniques can be employed. We also describe early uses of DLHub for scientific applications.
Comment: 10 pages, 8 figures, conference paper.
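The serving model described above, in which any Python-compatible function can be published by name and chained into multiple-function pipelines, can be sketched as a registry of callables. This is a conceptual illustration only; the names (`publish`, `invoke`, `invoke_pipeline`) are hypothetical and do not reflect DLHub's actual SDK.

```python
from typing import Any, Callable

# Sketch of name-based function serving: arbitrary Python callables are
# published under names and can be composed into pipelines where each
# stage's output feeds the next. All names are illustrative assumptions.

registry: dict = {}

def publish(name: str, func: Callable) -> None:
    """Register a callable under a servable name."""
    registry[name] = func

def invoke(name: str, data: Any) -> Any:
    """Run a single published function on the given input."""
    return registry[name](data)

def invoke_pipeline(names: list, data: Any) -> Any:
    # A multiple-function pipeline: stages execute in order.
    for name in names:
        data = invoke(name, data)
    return data

# Publishing two toy "models": a preprocessing step and an aggregator.
publish("normalize", lambda xs: [x / max(xs) for x in xs])
publish("mean", lambda xs: sum(xs) / len(xs))
```

Serving by name is what decouples model consumers from model packaging: a caller needs only the published identifier, not the model's code or environment.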